18.415 Project - Using a Count-Min sketch structure for easier analysis of Bloom filter false positive rates
Abstract
The classical proof of the Bloom filter false positive rate was shown to be incorrect because of a subtle error concerning the independence of events. Although the previously computed rate is still correct asymptotically, it is wrong for small parameter values and is in general only a lower bound [BGK08]. Indeed, the correct analysis for Bloom filters does not admit a convenient closed form, though efficient computations do exist [CRJ10]. In this paper, we outline the strategy of the new analysis. Furthermore, we demonstrate that, for large parameter values, a Count-Min [CM05] data structure asymptotically recovers the classical Bloom filter false positive rate using only the naive analysis scheme. This has no practical application, as the Bloom filter is strictly better in every sense, but the construction may be useful pedagogically for demonstrating the subtleties involved in working with independence. Lastly, exact expected false positive rates are trivially computable for this structure; although these rates are worse than those of regular Bloom filters, this may be of some theoretical use.
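The comparison described in the abstract can be illustrated numerically. The sketch below is not taken from the paper. It assumes that the "Count-Min structure" means splitting the m bits into k rows of m // k bits, with one idealized, fully random hash function per row, so that the per-row events are independent and the naive product argument applies. The function names and the parameters m (bits), n (inserted items), and k (hash functions) are hypothetical choices made for illustration.

```python
import math


def classical_bloom_fp(m: int, n: int, k: int) -> float:
    """Classical (naive) estimate for a standard Bloom filter.

    Per the abstract, this is asymptotically correct but in general
    only a lower bound on the true false positive rate.
    """
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def row_partitioned_fp(m: int, n: int, k: int) -> float:
    """False positive rate under the assumed row-per-hash layout.

    Each of the k rows has w = m // k bits and its own hash function,
    so the naive product over rows is valid under ideal hashing.
    """
    w = m // k  # assumption: equal-width rows, leftover bits unused
    return (1.0 - (1.0 - 1.0 / w) ** n) ** k


def asymptotic_fp(m: int, n: int, k: int) -> float:
    """Large-parameter limit shared by both expressions."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    m, n, k = 10_000, 1_000, 7
    print("classical estimate :", classical_bloom_fp(m, n, k))
    print("row-partitioned    :", row_partitioned_fp(m, n, k))
    print("asymptotic limit   :", asymptotic_fp(m, n, k))
```

Under these assumptions the row-partitioned rate is never below the classical estimate, and both approach the same limit as m and n grow with n/m fixed, which is consistent with the abstract's remark that the rates are worse than for regular Bloom filters but asymptotically equivalent.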
Similar resources
Reducing False Positives of a Bloom Filter using Cross-Checking Bloom Filters
A Bloom filter is a compact data structure that supports membership queries on a set, allowing false positives. The simplicity and the excellent performance of a Bloom filter make it a standard data structure of great use in many network applications. In reducing the false positive rate of a Bloom filter, it is well known that the size of a Bloom filter and accordingly the number of hash indice...
A Cuckoo Filter Modification Inspired by Bloom Filter
Probabilistic data structures are popular in membership queries, network applications, and so on. Bloom Filter and Cuckoo Filter are two popular space-efficient models that are incorporated in the set membership checking part of many important protocols. They are compact representations of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...
Optimized hash for network path encoding with minimized false positives
The Bloom filter is a space efficient randomized data structure for representing a set and supporting membership queries. Bloom filters intrinsically allow false positives. However, the space savings they offer outweigh the disadvantage if the false positive rates are kept sufficiently low. Inspired by the recent application of the Bloom filter in a novel multicast forwarding fabric, this paper...
Accurate Per-Flow Measurement with Bloom Sketch
A sketch is a probabilistic data structure that is widely used for per-flow measurement in networks. The most common sketches are the CM sketch and its several variants. However, given a limited memory size, these sketches always significantly overestimate some flows, exhibiting poor accuracy. To address this issue, we proposed a novel sketch named the Bloom sketch, combining the sketch with the B...
Stream Clustering using Probabilistic Data Structures
Most density based stream clustering algorithms separate the clustering process into an online and offline component. Exact summarized statistics are being employed for defining micro-clusters or grid cells during the online stage followed by macro-clustering during the offline stage. This paper proposes a novel alternative to the traditional two phase stream clustering scheme, introducing sket...
Publication date: 2013